Indexed Map-Reduce Join Algorithm

Author

  • Mohamed Helmy Khafagy
Abstract

MapReduce is used to handle and support massive data sets. Data sizes are increasing rapidly, and big data analysis has become imperative today. MapReduce extracts useful information using two simple functions, map and reduce, while providing load balancing, fault tolerance, and high scalability. The most important operation in the analysis process is the join. This paper presents a new two-way join algorithm, the Indexed Map-Reduce Join Algorithm, which uses an index on the large table to decrease I/O and shuffling, resulting in better join performance in MapReduce. Our experimental results show that the Index-join algorithm outperforms other algorithms as the data size increases from 100 million to 500 million records, without memory overflow.
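
The abstract does not spell out the algorithm's details, so the following is only a minimal single-machine sketch, under our own assumptions, of the general idea behind an index-based two-way join compared with a standard reduce-side (repartition) join. The function names `repartition_join`, `build_index`, and `indexed_join`, and the in-memory dict standing in for the large table's index, are hypothetical and not taken from the paper. The sketch counts how many large-table records each approach must read or shuffle; index construction is treated as a one-time cost and is not counted.

```python
from collections import defaultdict


def repartition_join(small, large):
    """Reduce-side join: every record of both tables is emitted and shuffled by key."""
    shuffle = defaultdict(lambda: ([], []))  # key -> (small-table values, large-table values)
    shuffled = 0
    # Map phase: tag each record with its source table and group it under its join key.
    for key, val in small:
        shuffle[key][0].append(val)
        shuffled += 1
    for key, val in large:
        shuffle[key][1].append(val)
        shuffled += 1
    # Reduce phase: pair the small-table and large-table values that share a key.
    out = [(k, s, l) for k, (ss, ls) in shuffle.items() for s in ss for l in ls]
    return sorted(out), shuffled


def build_index(large):
    """Hypothetical index on the large table: join key -> record positions."""
    index = defaultdict(list)
    for pos, (key, _) in enumerate(large):
        index[key].append(pos)
    return index


def indexed_join(small, large, index):
    """Index lookup: only large-table records whose key appears in the small table are read."""
    out, large_reads = [], 0
    for key, s in small:
        for pos in index.get(key, []):
            large_reads += 1
            _, l = large[pos]  # positional access stands in for an indexed read
            out.append((key, s, l))
    return sorted(out), large_reads


if __name__ == "__main__":
    small = [(1, "a"), (3, "c")]
    large = [(k % 5, f"r{k}") for k in range(20)]  # keys 0..4, four records each
    joined_r, shuffled = repartition_join(small, large)
    joined_i, reads = indexed_join(small, large, build_index(large))
    assert joined_r == joined_i  # both strategies produce the same inner join
    print(len(joined_i), shuffled, reads)  # 8 22 8
```

In this toy run both joins return the same 8 rows, but the repartition join shuffles all 22 records while the indexed lookup touches only the 8 matching large-table records. That gap is the kind of I/O and shuffle saving the abstract attributes to indexing the large table.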

Similar resources

Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework

The Map/Reduce framework, a parallel processing paradigm, is widely used for large-scale distributed data processing. Map/Reduce can perform typical relational database operations such as selection, aggregation, and projection. However, binary relational operators such as join, Cartesian product, and set operations are difficult to implement with Map/Reduce. Map/Reduce can process homogeneous ...

Runtime Optimization of Join Location in Parallel Data Management Systems

Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce side joins, or by fetching data from the storage system to compute nodes, corresponding to map side join. Both m...

A Comparative Analysis of Join Algorithms Using the Hadoop Map/Reduce Framework

The Map/Reduce framework is a programming model recently introduced by Google Inc. to support distributed computing on very large datasets across a large number of machines. It provides a simple yet powerful way to implement distributed applications without requiring deep knowledge of parallel programming. Each participating node executes Map and/or Reduce tasks which involve reading and wri...

A Scalable and Skew-insensitive Algorithm for Join Operations using Map/Reduce Model

For over a decade, Map/Reduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model ensures scalability, reliability, and availability with reasonable query processing time. However, these large-scale systems still face some challenges: data skew, task imbalance, high disk I/O, and redistribution costs can have disastrous effects on...

A Unified Approach for Indexed and Non-Indexed Spatial Joins

Most spatial join algorithms either assume the existence of a spatial index structure that is traversed during the join process, or solve the problem by sorting, partitioning, or on-the-fly index construction. In this paper, we develop a simple plane-sweeping algorithm that unifies the index-based and non-index-based approaches. This algorithm processes indexed as well as non-indexed inputs, ext...

Publication date: 2015